Optimal data processing for quantum measurements
We consider the general measurement scenario in which the ensemble average of an operator is
determined via suitable data-processing of the outcomes of a quantum measurement described by a
POVM. We determine the optimal processing that minimizes the statistical error of the estimation.

A measurement in Quantum Mechanics is usually associated with an observable represented by a selfadjoint operator $X$ on the Hilbert space $\mathcal{H}$ of the quantum system [1], with the eigenvalues $x_i$ defining the possible outcomes of the measurement. The probability of the $i$th outcome is given by the Born rule

$$p(i|\rho) = \mathrm{Tr}[P_i \rho], \tag{1}$$

$\rho$ being the density operator of the state and $P_i$ denoting the orthogonal projectors in the spectral decomposition $X = \sum_{i=1}^{N} x_i P_i$ (for the sake of illustration we consider here only a finite spectrum). Consequently, the expected value of the outcome averaged over repeated measurements is given by the ensemble average $\langle X\rangle = \mathrm{Tr}[\rho X]$, with statistical error proportional to the r.m.s. $\sqrt{\langle \Delta X^2\rangle}$, with $\Delta X^2 := X^2 - \langle X\rangle^2$.
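As a minimal numerical sketch of the Born rule (1) and of the ensemble average, the following Python/NumPy snippet evaluates the outcome probabilities, the mean and the r.m.s. for a qubit observable. The observable ($\sigma_x$) and the state are illustrative assumptions chosen here, not taken from the text.

```python
import numpy as np

# Illustrative choices (assumptions, not from the text): X = sigma_x and a mixed qubit state.
X = np.array([[0, 1], [1, 0]], dtype=complex)
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)

# Spectral decomposition X = sum_i x_i P_i (non-degenerate spectrum in this example).
x_vals, vecs = np.linalg.eigh(X)
projectors = [np.outer(v, v.conj()) for v in vecs.T]

# Born rule, Eq. (1): p(i|rho) = Tr[P_i rho].
probs = np.array([np.trace(P @ rho).real for P in projectors])

mean_X = float(np.sum(x_vals * probs))                  # <X> = Tr[rho X]
var_X = float(np.sum(x_vals**2 * probs)) - mean_X**2    # <Delta X^2>
rms_X = np.sqrt(var_X)
print(probs, mean_X, rms_X)
```

The mean computed from the outcome statistics agrees with $\mathrm{Tr}[\rho X]$, which is the consistency the paragraph above relies on.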

There are, however, more general kinds of measurements that can be performed in the lab, which are not necessarily associated with any observable but nevertheless enable the experimental determination of ensemble averages: these are the measurements described by POVMs. A POVM (acronym for Positive Operator-Valued Measure) is a set of (generally nonorthogonal) positive operators $P_i \geq 0$, $1 \leq i \leq N$, which resolve the identity, $\sum_{i=1}^{N} P_i = I$, similarly to the orthogonal projectors of an observable, and hence obey the same Born rule (1). This more general class of quantum measurements also includes the description of optimal joint measurements of non-commuting observables [2, 3], along with measurements of parameters with no corresponding observable, such as the phase of a harmonic oscillator [4], and many other practical measurements, such as optimized discrimination of states for quantum communications [5] and, most interestingly, the so-called informationally complete measurements [6], i.e. measurements that allow one to determine the density matrix of the state or any other desired ensemble average, as in so-called Quantum Tomography [7]. Moreover, POVMs also provide a full description of the measurement apparatus, including noisy channels before detection [8]. POVMs are not just a theoretical tool, since there is a general quantum calibration procedure for determining experimentally the POVM of a measurement device by using a reliable standard [9].

How can we experimentally determine the ensemble average of the (generally complex) operator $X$ using a POVM? Clearly this is possible if $X$ can be expanded over the POVM elements (mathematically, we denote this condition as $X \in \mathrm{Span}\{P_i\}_{i=1,\ldots,N}$). This means that there exists a set of coefficients $f_i[X]$ such that

$$X = \sum_{i=1}^{N} f_i[X]\, P_i, \qquad X \in \mathcal{S} := \mathrm{Span}\{P_i\}_{i=1,\ldots,N}. \tag{2}$$



When $\mathcal{S} = \mathcal{B}(\mathcal{H})$ (i.e. when all operators can be expanded over the POVM), the measurement is informationally complete. Obviously, once the expansion (2) is established, one can obtain the ensemble average of $X$ by the following averaging

$$\langle X\rangle = \sum_{i=1}^{N} f_i[X]\, p(i|\rho), \tag{3}$$

where the probability distribution is given in Eq. (1).
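The expansion (2) and the averaging rule (3) can be checked numerically. The sketch below is only an illustration under stated assumptions: the six-outcome qubit POVM built by the hypothetical helper `pauli_povm`, the operator `X` and the state `rho` are choices made here, and the coefficients are obtained with a generic least-squares solver, which returns one admissible set of $f_i[X]$ among many.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted eigenprojectors of
    sigma_x, sigma_y, sigma_z (weights sum to 1, so the elements sum to I)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)        # illustrative state

# Column i of Lam is the vectorized P_i, so Lam @ f = vec(X) is Eq. (2).
Lam = np.stack([P.reshape(-1) for P in povm], axis=1)
f, *_ = np.linalg.lstsq(Lam, X.reshape(-1), rcond=None)

X_rebuilt = sum(fi * P for fi, P in zip(f, povm))
probs = np.array([np.trace(P @ rho).real for P in povm])   # Born rule, Eq. (1)
estimate = np.sum(f * probs)                                # averaging rule, Eq. (3)
print(estimate, np.trace(rho @ X))
```

Since this POVM has $N = 6$ elements spanning a 4-dimensional operator space, the coefficients are not unique, which is exactly the freedom that the optimization discussed in the text exploits.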

The above general measurement procedure opens the problem of finding the coefficients $f_i[X]$ in Eq. (2), namely the data-processing of the measurement outcomes needed to determine the ensemble average of $X$. In general the coefficients $f_i[X]$ are not unique (if $N > \dim(\mathcal{S})$), and one then wants to optimize the data-processing according to a practical criterion, typically minimizing the statistical error. This problem has never been addressed in the general case, and its solution is presented in this Letter. Notice that although the processing functions enter the definition (2) linearly, there is no guarantee that the optimal ones are linear in $X$. However, as we will see, the optimal processing function is remarkably indeed linear in $X$, and depends only on the POVM and, in a Bayesian scheme, on the ensemble of possible input states. (Due to the simplicity and popularity of the Bayesian scheme, in this Letter we restrict the analysis to this scheme, postponing the analysis of the minimax strategy to another, more technical publication; for a comparison between the two frameworks see, for example, Ref. [10].) The derivation of the optimal data-processing function requires some notions of frame theory [11, 12] and linear algebra, which will be introduced in the first part of the Letter. Actually, for simplicity, instead of presenting the actual derivation we will first prove uniqueness of the optimal processing, then present the result and prove that it satisfies the equations for optimality. At the end we will also consider a simple example of application for the sake of a quantitative estimation, showing that the optimization can lead
to sizable improvements.

In the Bayesian scheme one has an a priori ensemble $\mathcal{E} := \{\rho_i, p_i\}$ of possible states $\rho_i$ of the quantum system occurring with probability $p_i$.

For finite dimension all bounded operators are Hilbert-Schmidt, whence $\mathcal{S}$ is a Hilbert space, and indeed $\mathcal{S} \subseteq \mathcal{H}^{\otimes 2}$, and linear operators can be associated with bipartite vectors as follows [13]

$$A = \sum_{m,n=1}^{d} A_{mn}\, |m\rangle\langle n| \;\leftrightarrow\; |A\rangle\!\rangle = \sum_{m,n=1}^{d} A_{mn}\, |m\rangle|n\rangle, \tag{4}$$

with the Hilbert-Schmidt scalar product $\langle\!\langle A|B\rangle\!\rangle = \mathrm{Tr}[A^\dagger B]$. In the following we will retain the double-ket notation as a reminder of the correspondence (4). Completeness of the set of vectors $\{|P_i\rangle\!\rangle\}_{1\le i\le N}$ with $\mathcal{S} := \mathrm{Span}\{|P_i\rangle\!\rangle\}_{1\le i\le N}$ can be written as follows

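$$a\,\|X\|_2^2 \le \sum_{i=1}^{N} |\langle\!\langle P_i|X\rangle\!\rangle|^2 \le b\,\|X\|_2^2, \qquad X \in \mathcal{S}, \tag{5}$$

with $0 < a \le b < \infty$, where $\|Z\|_2$ is the Hilbert-Schmidt norm induced by the scalar product, $\|Z\|_2 = \sqrt{\langle\!\langle Z|Z\rangle\!\rangle} = \sqrt{\mathrm{Tr}[Z^\dagger Z]}$. In the literature, Eq. (5) with the $|P_i\rangle\!\rangle$ regarded as abstract vectors in the linear space $\mathcal{S}$ [14] defines a so-called frame of vectors. The main theorem of frame theory states that a set of vectors in $\mathcal{S}$ is a frame iff the operator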



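$$F = \sum_{i=1}^{N} |P_i\rangle\!\rangle\langle\!\langle P_i|, \tag{6}$$

called the frame operator, is invertible (here the fact that the set $\{|P_i\rangle\!\rangle\}_{1\le i\le N}$ is a frame for $\mathcal{S}$ trivially follows from the definition of $\mathcal{S}$). Since $F$ is invertible, one can obtain suitable coefficients $f_i[X]$ for the expansion of a vector $|X\rangle\!\rangle$ by the formula

$$f_i[X] = \langle\!\langle \Delta_i|X\rangle\!\rangle, \tag{7}$$

where $\{\Delta_i\}$ is the canonical dual [11], which is defined through the identity

$$|\Delta_i\rangle\!\rangle = F^{-1}|P_i\rangle\!\rangle. \tag{8}$$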
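The frame operator (6) and the canonical dual (8) are straightforward to build numerically with the vectorization (4). The following sketch assumes the same hypothetical six-outcome qubit POVM (`pauli_povm`) and illustrative operator `X` as a stand-in for a concrete measurement; it also evaluates the canonical coefficients of the identity operator.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator

# Row-major flattening implements |A>> = sum_mn A_mn |m>|n>, Eq. (4).
kets = np.stack([P.reshape(-1) for P in povm], axis=1)   # columns are |P_i>>
F = kets @ kets.conj().T                                  # frame operator, Eq. (6)
F_inv = np.linalg.pinv(F)       # inverse on S (here S is the whole operator space)
duals = F_inv @ kets                                      # columns are |Delta_i>>, Eq. (8)

f = duals.conj().T @ X.reshape(-1)                        # f_i[X] = <<Delta_i|X>>, Eq. (7)
X_rebuilt = (kets @ f).reshape(2, 2)

f_identity = duals.conj().T @ np.eye(2).reshape(-1)       # canonical coefficients of I
print(np.round(f_identity.real, 3))
```

For this particular POVM the canonical coefficients of the identity are not all equal to one, even though they do reproduce $I$ when inserted in the expansion.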



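However, if the vectors $\{|P_i\rangle\!\rangle\}_{1\le i\le N}$ are linearly dependent, the processing rule (7) is not unique, and all different choices of coefficients are provided by $f_i[X] = \langle\!\langle D_i|X\rangle\!\rangle$, where the $\{D_i\}$ are alternate duals. All alternate duals can be classified as follows [15]

$$|D_i\rangle\!\rangle = |\Delta_i\rangle\!\rangle + |Y_i\rangle\!\rangle - \sum_{j=1}^{N} \langle\!\langle P_j|\Delta_i\rangle\!\rangle\, |Y_j\rangle\!\rangle, \tag{9}$$

where the operators $\{Y_i\}$ are arbitrary elements of $\mathcal{S}$.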
Now, one can define a linear map $\Lambda$ from an abstract $N$-dimensional space $\mathcal{K}$ of coefficient vectors $|c\rangle$ to $\mathcal{S}$ as follows

$$\Lambda|c\rangle = \sum_{i=1}^{N} c_i\, |P_i\rangle\!\rangle, \tag{10}$$

and $\Lambda$ has matrix elements $\Lambda_{mn,i} = (P_i)_{mn}$. By definition any alternate dual must satisfy

$$\sum_{j=1}^{N} \sum_{m,n=1}^{d} \sum_{i=1}^{N} (P_j)_{pq}\, (D_j^*)_{mn}\, (P_i)_{mn}\, c_i = \sum_{i=1}^{N} (P_i)_{pq}\, c_i, \tag{11}$$

for all $|c\rangle \in \mathcal{K}$. Defining the matrix $\Gamma$ with elements $\Gamma_{i,mn} = (D_i^*)_{mn}$ one has

$$\Lambda\Gamma\Lambda = \Lambda, \tag{12}$$

which is the definition of generalized inverse (or pseudoinverse) of $\Lambda$. Alternate duals are then in one-to-one correspondence with generalized inverses of $\Lambda$. This fact was already noticed in Ref. [16], and will be very useful in the following.
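The correspondence between alternate duals and generalized inverses can be illustrated numerically. In the sketch below, which assumes the same hypothetical qubit POVM as a test case, a second generalized inverse is generated from the Moore-Penrose one through the standard parametrization $\Gamma' = \Gamma + (I - \Gamma\Lambda)W$ with an arbitrary matrix $W$; this parametrization is a textbook linear-algebra construction chosen here for illustration, not a formula from the text.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
N = len(povm)
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator
Lam = np.stack([P.reshape(-1) for P in povm], axis=1)     # d^2 x N matrix of Eq. (10)

Gam = np.linalg.pinv(Lam)                                  # one generalized inverse (N x d^2)
rng = np.random.default_rng(0)
W = rng.normal(size=Gam.shape) + 1j * rng.normal(size=Gam.shape)   # arbitrary matrix
Gam_alt = Gam + (np.eye(N) - Gam @ Lam) @ W                # another generalized inverse

# Row i of a generalized inverse holds the entries of D_i^*; recover the dual operators.
duals_alt = [Gam_alt[i].conj().reshape(2, 2) for i in range(N)]

f_alt = Gam_alt @ X.reshape(-1)                            # f_i[X] = <<D_i|X>>
X_rebuilt = sum(fi * P for fi, P in zip(f_alt, povm))
print(np.allclose(Lam @ Gam_alt @ Lam, Lam), np.allclose(X_rebuilt, X))
```

Both matrices satisfy Eq. (12) and both reproduce $X$, although the corresponding coefficient vectors differ.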

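We now want to minimize the statistical error in the determination of the ensemble average. This is provided by the variance

$$\delta_D(X) := \sum_{i=1}^{N} p(i|\rho_\mathcal{E})\, |f_i[X]|^2 - \overline{|\langle X\rangle|^2}_\mathcal{E}, \tag{13}$$

where $\rho_\mathcal{E} = \sum_i p_i \rho_i$, and $\overline{|\langle X\rangle|^2}_\mathcal{E} = \sum_i p_i\, |\mathrm{Tr}[\rho_i X]|^2$ is the squared modulus of the expectation of $X$ averaged over the states in the ensemble. One has

$$\delta_D(X) = \sum_{i=1}^{N} |\langle\!\langle D_i|X\rangle\!\rangle|^2\, \mathrm{Tr}[\rho_\mathcal{E} P_i] - \overline{|\langle X\rangle|^2}_\mathcal{E}. \tag{14}$$

Notice that the term $\overline{|\langle X\rangle|^2}_\mathcal{E}$ depends only on the ensemble, and is independent of the POVM, whence we will focus attention only on the contribution

$$\Sigma_D(X) := \sum_{i=1}^{N} |\langle\!\langle D_i|X\rangle\!\rangle|^2\, \mathrm{Tr}[\rho_\mathcal{E} P_i]. \tag{15}$$

A relevant case is that of the uniform ensemble, with all pure states equally distributed, corresponding to $\rho_\mathcal{E} = I/d$ and $\overline{|\langle X\rangle|^2}_\mathcal{E} = \frac{1}{d(d+1)}\left(\mathrm{Tr}[X^\dagger X] + |\mathrm{Tr}[X]|^2\right)$ [?].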
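The quantities in Eqs. (13)-(15) can be evaluated directly for a small example. The sketch below assumes the same hypothetical qubit POVM and illustrative operator as a test case and uses the canonical dual as processing. As a stand-in for the uniform ensemble it uses the six Pauli eigenstates with equal weights, which form a 2-design and therefore reproduce the uniform average of $|\langle X\rangle|^2$; this replacement is an assumption made here for a deterministic check.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator
d = 2

# Equally weighted Pauli eigenstates (a 2-design) standing in for the uniform ensemble.
s = 1 / np.sqrt(2)
states = [np.array(v, dtype=complex) for v in
          ([s, s], [s, -s], [s, 1j * s], [s, -1j * s], [1, 0], [0, 1])]
rho_E = sum(np.outer(psi, psi.conj()) for psi in states) / len(states)
avg_sq = np.mean([abs(psi.conj() @ X @ psi) ** 2 for psi in states])
closed_form = (np.trace(X.conj().T @ X).real + abs(np.trace(X)) ** 2) / (d * (d + 1))

# Canonical-dual processing f_i[X] = <<Delta_i|X>>.
kets = np.stack([P.reshape(-1) for P in povm], axis=1)
duals = np.linalg.pinv(kets @ kets.conj().T) @ kets
f = duals.conj().T @ X.reshape(-1)

pi = np.array([np.trace(rho_E @ P).real for P in povm])   # p(i|rho_E)
Sigma = float(np.sum(pi * np.abs(f) ** 2))                 # Eq. (15)
delta = Sigma - avg_sq                                     # Eq. (14)
print(avg_sq, closed_form, Sigma, delta)
```

The ensemble average of $|\langle X\rangle|^2$ obtained from the six states coincides with the closed formula quoted for the uniform ensemble.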

Eq. (15) defines a (squared) norm of the vector of coefficients corresponding to the metric matrix $\pi_{ij} = \mathrm{Tr}[\rho_\mathcal{E} P_i]\,\delta_{ij}$. Then, minimizing $\Sigma_D(X)$ corresponds to determining the minimum norm generalized inverse $\Gamma$ of $\Lambda$ with respect to the norm $\|c\|_\pi^2 = \sum_{i=1}^{N} |c_i|^2 \pi_{ii}$. The minimum norm condition for $\pi = I$ corresponds to the Moore-Penrose generalized inverse $\Gamma$ [17], satisfying the three conditions: $\Gamma\Lambda\Gamma = \Gamma$, $\Gamma\Lambda = \Lambda^\dagger\Gamma^\dagger$ and $\Lambda\Gamma = \Gamma^\dagger\Lambda^\dagger$. The Moore-Penrose generalized inverse of a matrix $Z$ (also denoted as $Z^\ddagger$) turns out to be simply the inverse of $Z$ on its support $\mathrm{Supp}(Z)$ (the support $\mathrm{Supp}(Z)$ of $Z$ is the orthogonal complement of the kernel $\mathrm{Ker}(Z)$ of $Z$), and acts as the null matrix on $\mathrm{Ker}(Z)$.
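For the case $\pi = I$ the three Moore-Penrose conditions and the minimum norm property can be verified with NumPy's `np.linalg.pinv`. The POVM and operator below are the same hypothetical test case used for illustration; the comparison vector is obtained by adding an element of $\mathrm{Ker}(\Lambda)$ to the minimum norm solution.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
N = len(povm)
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator
Lam = np.stack([P.reshape(-1) for P in povm], axis=1)

Gam = np.linalg.pinv(Lam)                      # Moore-Penrose generalized inverse
GL, LG = Gam @ Lam, Lam @ Gam
conditions = (
    np.allclose(Gam @ Lam @ Gam, Gam),         # Gamma Lambda Gamma = Gamma
    np.allclose(GL, GL.conj().T),              # Gamma Lambda is selfadjoint
    np.allclose(LG, LG.conj().T),              # Lambda Gamma is selfadjoint
)

c_min = Gam @ X.reshape(-1)                    # minimum Euclidean norm coefficients
kernel_vec = (np.eye(N) - GL) @ np.ones(N)     # an element of Ker(Lam)
c_other = c_min + kernel_vec                   # another valid set of coefficients
print(conditions, np.linalg.norm(c_min), np.linalg.norm(c_other))
```

Any other admissible coefficient vector differs from the Moore-Penrose one by a kernel component, which can only increase the Euclidean norm.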

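Following the same lines of derivation as for the Moore-Penrose generalized inverse, one can show that the minimum norm generalized inverse for a generic $\pi$ is independent of $X$, and is defined by the condition [17]

$$\pi\Gamma\Lambda = \Lambda^\dagger\Gamma^\dagger\pi. \tag{16}$$

The matrix $\Gamma\Lambda$ has matrix elements $(\Gamma\Lambda)_{ij} = \langle\!\langle D_i|P_j\rangle\!\rangle$. Eq. (16) rewritten in terms of the optimal dual becomes

$$\langle\!\langle \rho_\mathcal{E}|P_i\rangle\!\rangle \langle\!\langle D_i|P_j\rangle\!\rangle = \langle\!\langle P_i|D_j\rangle\!\rangle \langle\!\langle P_j|\rho_\mathcal{E}\rangle\!\rangle. \tag{17}$$

Upon summing over the index $i$, and remembering that for any dual $\{D_i\}$ one has $\sum_i |P_i\rangle\!\rangle\langle\!\langle D_i| = \Pi_\mathcal{S}$, where $\Pi_\mathcal{S}$ is the projection on $\mathcal{S}$, one has $\langle\!\langle \rho_\mathcal{E}|P_j\rangle\!\rangle = \mathrm{Tr}[D_j]\, \langle\!\langle P_j|\rho_\mathcal{E}\rangle\!\rangle$, and consequently $\mathrm{Tr}[D_j] = 1$. This implies that the optimal processing function for the identity operator is $f_i[I] = 1$, whence $\delta_D(I) = 0$, whereas, remarkably, $f_i[I]$ is generally not constant for the canonical dual.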

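We will now prove that the solution of Eq. (17) is unique. For non-invertible $\pi$ we can restrict Eq. (16) to $\mathrm{Supp}(\pi)$, and from now on we will denote the corresponding blocks of all matrices with the same symbols. Suppose now that there exist two generalized inverses $\Gamma$ and $\Gamma'$ satisfying Eq. (16). Upon defining $\Theta = \Gamma - \Gamma'$, we have that

$$\Lambda\Theta\Lambda = 0, \qquad \pi\Theta\Lambda = \Lambda^\dagger\Theta^\dagger\pi, \tag{18}$$

and multiplying on the left by $\Lambda\pi^{-1}$ both members of the second equation, and substituting the first equation, we obtain $\Lambda\Theta\Lambda = \Lambda\pi^{-1}\Lambda^\dagger\Theta^\dagger\pi = 0$, or equivalently, by invertibility of $\pi$, $\Lambda\pi^{-1}\Lambda^\dagger\Theta^\dagger = 0$. The matrix $\Lambda\pi^{-1}\Lambda^\dagger$ can be rewritten as

$$\Lambda\pi^{-1}\Lambda^\dagger = \sum_{i=1}^{N} (\pi^{-1})_{ii}\, |P_i\rangle\!\rangle\langle\!\langle P_i|. \tag{19}$$

Since $\Lambda\pi^{-1}\Lambda^\dagger \ge 0$, a necessary and sufficient condition for a vector $X \in \mathcal{S}$ to be in $\mathrm{Ker}(\Lambda\pi^{-1}\Lambda^\dagger)$ is that $\langle\!\langle X|\Lambda\pi^{-1}\Lambda^\dagger|X\rangle\!\rangle = 0$, namely

$$\sum_{i=1}^{N} (\pi^{-1})_{ii}\, |\langle\!\langle X|P_i\rangle\!\rangle|^2 = 0, \qquad X \in \mathcal{S}, \tag{20}$$

which is possible iff $\langle\!\langle X|P_i\rangle\!\rangle = 0$ for all $i$. By completeness of the $P_i$, this is equivalent to saying that the only vector of $\mathcal{S}$ in $\mathrm{Ker}(\Lambda\pi^{-1}\Lambda^\dagger)$ is $X = 0$. Then $\Lambda\pi^{-1}\Lambda^\dagger$ is full rank, whence $\Theta = 0$, or equivalently $\Gamma = \Gamma'$.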

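We will now provide the solution to Eq. (16) in terms of the optimal dual, which is expressed as

$$D_i = \Delta_i - \sum_{j=1}^{N} \left\{[(I - M)\pi(I - M)]^\ddagger \pi M\right\}_{ij} \Delta_j, \tag{21}$$

where $\Delta_i$ is the canonical dual, and $M_{ij} := \langle\!\langle \Delta_i|P_j\rangle\!\rangle = \langle\!\langle P_i|F^{-1}|P_j\rangle\!\rangle = \langle\!\langle P_i|\Delta_j\rangle\!\rangle = M_{ji}^*$. Since $\Delta_i^\dagger = \Delta_i$, one has $M_{ij}^* = M_{ij}$ [18], and the optimal dual frame $\{D_i\}$ in Eq. (21) is selfadjoint because the matrix $[(I - M)\pi(I - M)]^\ddagger \pi M$ has real elements. Notice that $M^2 = M$ and $M^\dagger = M$, namely $M$ is an orthogonal projector, as can be easily verified. Also $(I - M)$ is an orthogonal projector, and $[(I - M)\pi(I - M)]^\ddagger (I - M) = [(I - M)\pi(I - M)]^\ddagger$. The matrix $\Gamma\Lambda$ for the optimal dual frame can be easily calculated, and is equal to

$$\Gamma\Lambda = M - [(I - M)\pi(I - M)]^\ddagger \pi M. \tag{22}$$

We can substitute this expression in Eq. (16) to verify its validity. We have indeed

$$\begin{aligned}
\pi\Gamma\Lambda &= \pi M - \pi[(I - M)\pi(I - M)]^\ddagger \pi M \\
&= \pi M + \pi[(I - M)\pi(I - M)]^\ddagger (I - M)\pi(I - M) - \pi[(I - M)\pi(I - M)]^\ddagger \pi \\
&= \pi - \pi[(I - M)\pi(I - M)]^\ddagger \pi,
\end{aligned} \tag{23}$$

and analogously

$$\Lambda^\dagger\Gamma^\dagger\pi = \pi - \pi[(I - M)\pi(I - M)]^\ddagger \pi. \tag{24}$$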
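The optimal dual of Eq. (21) can be constructed and tested numerically. The sketch below is an illustration under assumptions made here: the hypothetical six-outcome qubit POVM `pauli_povm` and an ensemble average state $\rho_\mathcal{E} = \mathrm{diag}(0.8, 0.2)$. It checks the optimality condition (16), the expression (22) for $\Gamma\Lambda$, and the unit trace of the optimal dual operators.

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
N = len(povm)
rho_E = np.diag([0.8, 0.2]).astype(complex)          # assumed ensemble average state

Lam = np.stack([P.reshape(-1) for P in povm], axis=1)         # columns |P_i>>
Delta = np.linalg.pinv(Lam @ Lam.conj().T) @ Lam               # columns |Delta_i>>
M = (Delta.conj().T @ Lam).real                                # M_ij = <<Delta_i|P_j>> (real)
pi = np.diag([np.trace(rho_E @ P).real for P in povm])        # metric matrix pi
Q = np.eye(N) - M

# Explicit rcond: Q pi Q is rank deficient, so numerically zero singular values are cut off.
G = np.linalg.pinv(Q @ pi @ Q, rcond=1e-10)
B = G @ pi @ M                                                 # matrix appearing in Eq. (21)
D = Delta - Delta @ B.T                                        # columns |D_i>>, Eq. (21)

GL = D.conj().T @ Lam                                          # (Gamma Lambda)_ij = <<D_i|P_j>>
lhs = pi @ GL                                                  # pi Gamma Lambda
rhs = GL.conj().T @ pi                                         # Lambda^dagger Gamma^dagger pi
traces = np.array([np.trace(D[:, i].reshape(2, 2)).real for i in range(N)])
print(np.allclose(lhs, rhs), np.round(traces, 6))
```

With this choice of prior the metric $\pi$ is not proportional to the identity, so the resulting dual differs from the canonical one.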

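When $\pi \propto I$ the canonical dual is optimal, since for the canonical dual one has $\Gamma\Lambda = \Lambda^\dagger\Gamma^\dagger$. This is the case e.g. of the uniform ensemble of pure states with POVM elements with constant trace, which includes all covariant POVMs studied in Ref. [16]. In the general case, one can write the expression of Eq. (15) as follows

$$\Sigma_D(X) = \Sigma_\Delta(X) - \vartheta(X), \tag{25}$$

where $\Sigma_\Delta$ is the contribution of the canonical dual

$$\Sigma_\Delta(X) = \sum_{i=1}^{N} |\langle\!\langle \Delta_i|X\rangle\!\rangle|^2\, \mathrm{Tr}[\rho_\mathcal{E} P_i], \tag{26}$$

and $\vartheta$ is the correction due to the optimization, which is given by

$$\vartheta(X) = \sum_{i,j=1}^{N} \langle\!\langle X|\Delta_i\rangle\!\rangle \left(\pi[(I - M)\pi(I - M)]^\ddagger \pi\right)_{ij} \langle\!\langle \Delta_j|X\rangle\!\rangle. \tag{27}$$

The relative added noise of the canonical dual compared to the optimal one is given by

$$\varepsilon(X) = \frac{\delta_\Delta(X) - \delta_D(X)}{\delta_D(X)} = \frac{\vartheta(X)}{\Sigma_\Delta(X) - \vartheta(X) - \overline{|\langle X\rangle|^2}_\mathcal{E}}. \tag{28}$$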
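Equations (25)-(28) can be evaluated for a toy case. The sketch below is not the example of the text: it assumes the hypothetical six-outcome qubit POVM `pauli_povm`, an illustrative operator `X`, and a two-state prior ensemble chosen here. The optimized contribution is also recomputed directly from the optimal coefficient vector as a cross-check of Eq. (25).

```python
import numpy as np

def pauli_povm(weights=(0.5, 0.3, 0.2)):
    """Hypothetical six-outcome qubit POVM: weighted Pauli eigenprojectors."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    return [w * (eye + s * sig) / 2
            for w, sig in zip(weights, (sx, sy, sz)) for s in (1, -1)]

povm = pauli_povm()
N = len(povm)
X = np.array([[1, -2.24 - 1j], [-2.24 + 1j, -1]], dtype=complex)   # illustrative operator
# Assumed prior ensemble: |0> with probability 0.8 and |1> with probability 0.2.
ensemble = [(0.8, np.array([1, 0], dtype=complex)),
            (0.2, np.array([0, 1], dtype=complex))]
rho_E = sum(p * np.outer(psi, psi.conj()) for p, psi in ensemble)

Lam = np.stack([P.reshape(-1) for P in povm], axis=1)
Delta = np.linalg.pinv(Lam @ Lam.conj().T) @ Lam               # canonical dual
M = (Delta.conj().T @ Lam).real
pi = np.diag([np.trace(rho_E @ P).real for P in povm])
Q = np.eye(N) - M
G = np.linalg.pinv(Q @ pi @ Q, rcond=1e-10)                    # [(I-M) pi (I-M)]^ddagger

c = Delta.conj().T @ X.reshape(-1)                 # c_i = <<Delta_i|X>>
Sigma_can = (c.conj() @ pi @ c).real               # Eq. (26)
theta = (c.conj() @ pi @ G @ pi @ c).real          # Eq. (27)
Sigma_opt = Sigma_can - theta                      # Eq. (25)

c_opt = c - G @ pi @ c                             # optimal coefficients <<D_i|X>>
Sigma_direct = (c_opt.conj() @ pi @ c_opt).real    # Eq. (15) for the optimal dual

avg_sq = sum(p * abs(psi.conj() @ X @ psi) ** 2 for p, psi in ensemble)
eps = theta / (Sigma_opt - avg_sq)                 # relative added noise, Eq. (28)
print(Sigma_can, theta, eps)
```

In this toy case the optimization lowers the first term of the variance by a few percent; the size of the gain depends on how far the metric $\pi$ is from a multiple of the identity.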

A quantitative estimate of $\varepsilon(X)$ can be obtained from the following example in dimension two (see Fig. 1). Consider the POVM

FIG. 1: Example of optimized data-processing rule for the informationally complete POVM in Eq. (29). The plot shows the relative added noise in Eq. (28) for $X = \sigma_z + x\sigma_x + y\sigma_y$ versus $x$ and $y$.



The operator $X$ is the following selfadjoint operator

$$X = \begin{pmatrix} 1 & -2.24 - i \\ -2.24 + i & -1 \end{pmatrix}, \tag{30}$$

and for an ensemble of uniformly distributed pure states $\frac{1}{6}\left(\mathrm{Tr}[X^2] + \mathrm{Tr}[X]^2\right) = \frac{1}{6}\mathrm{Tr}[X^2] = 2.34$. By direct calculation one obtains $\Sigma_\Delta = 799.66$ and $\vartheta = 133.05$, and finally

$$\varepsilon(X) \simeq 0.2, \tag{31}$$

which means a relative added noise of about 20%. This example shows that a correct processing can greatly improve the statistics of expectation values, and eventually the convergence rate of tomographic state reconstruction. The additional error due to the use of the canonical dual instead of the optimal one is equivalent to a depolarizing channel with depolarization probability 0.09.

In conclusion, we considered the general measurement scenario in which the ensemble average of an operator is determined via suitable data-processing of the outcomes of a quantum measurement described by a POVM. We have determined the optimal processing that minimizes the statistical error of the estimation. Contrary to a widespread conviction, the optimal data-processing is generally not obtained via the canonical dual of the POVM, and the improvement due to optimization can be substantial. The present analysis has been carried out for finite spectrum and finite dimensions; however, it can be easily generalized to discrete spectrum in infinite dimensions for bounded operators and bounded duals, and, with more technicalities, even to continuous spectrum (the case of quantum homodyne tomography [19]). We believe that the present result will allow one to greatly improve many relevant experimental analyses of quantum
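measurements.

Notice that since the swap operator $E$ acts on a vector $|A\rangle\!\rangle \in \mathcal{H}^{\otimes 2}$ as $E|A\rangle\!\rangle = |A^T\rangle\!\rangle$, where $A^T$ is the transpose of $A$ on the basis $|n\rangle$ of Eq. (4), by selfadjointness of $P_i$ one has $E|P_i^*\rangle\!\rangle = |P_i\rangle\!\rangle$, and $E F^* E = F$. Similarly $E (F^{-1})^* E = F^{-1}$, and then

$$E|\Delta_i^*\rangle\!\rangle = E (F^{-1})^* |P_i^*\rangle\!\rangle = F^{-1} E |P_i^*\rangle\!\rangle = |\Delta_i\rangle\!\rangle,$$

namely $\Delta_i^\dagger = \Delta_i$. As a consequence $\langle\!\langle \Delta_i|P_j\rangle\!\rangle = \mathrm{Tr}[\Delta_i P_j] \in \mathbb{R}$.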



